Sensor Clustering Using a K-Means Algorithm in Combination with Optimized Unmanned Aerial Vehicle Trajectory in Wireless Sensor Networks

We examine a general wireless sensor network (WSN) model which incorporates a large number of sensors distributed over a large and complex geographical area. The study proposes solutions for a flexible deployment, low cost and high reliability in a wireless sensor network. To achieve these aims, we propose the application of an unmanned aerial vehicle (UAV) as a flying relay to receive and forward signals that employ nonorthogonal multiple access (NOMA) for a high spectral sharing efficiency. To obtain an optimal number of subclusters and optimal UAV positioning, we apply a sensor clustering method based on K-means unsupervised machine learning in combination with the gap statistic method. The study proposes an algorithm to optimize the trajectory of the UAV, i.e., the centroid-to-next-nearest-centroid (CNNC) path. Because a subcluster containing multiple sensors produces cochannel interference which affects the signal decoding performance at the UAV, we propose a diagonal matrix as a phase-shift framework at the UAV to separate and decode the messages received from the sensors. The study examines the outage probability performance of an individual WSN and provides results based on Monte Carlo simulations and analyses. The investigated results verified the benefits of the K-means algorithm in deploying the WSN.


Introduction
Deployments of wireless sensor networks (WSNs) are increasing because of their beneficial applications. For example, WSNs can be deployed to monitor or collect environmental data (meteorological information such as precipitation, wind speed and direction, air pressure, humidity, temperature, etc.) in remote or difficult terrain [1][2][3][4][5][6]. A major challenge in deploying a WSN is distributing a large number of wireless sensors over a large and complex geographical area. Wireless sensors are generally low cost, have low power consumption and are highly flexible in their application. However, transmitting a signal from a wireless sensor directly to a control centre presents an important challenge [7], especially if a large number of wireless sensors are deployed to directly collect data. Using terrestrial infrastructure for the purpose of collecting data from wireless sensors is impractical because of high deployment costs and a low flexibility. A potential solution to this problem is using an environmental monitoring system which dispatches an unmanned aerial vehicle (UAV) to a geographical area to retrieve the collected sensor data.
To deploy a WSN, Heinzelman et al. [8] proposed a low-energy adaptive clustering hierarchy (LEACH), a clustering method now considered the most well-known clustering protocol for WSNs. In a hierarchical topology, clusters contain two types of node: cluster members and cluster heads. Member nodes are grouped into different clusters, and in each cluster, a single node is designated a cluster head. The cluster head has the most important role in the cluster, tasked with receiving signals from cluster members and forwarding those signals to other cluster heads [9] or the base station [10].
UAVs have gained increasing consideration as aerial relays which deliver mobility and on-demand wireless connections in areas with complex topography and no network coverage. A UAV's online time, however, is limited by its own on-board energy limitations. The evolution of UAV-assisted WSNs is compelling the scientific community to search for new ways of performing energy harvesting (EH) from external power sources to prolong the online time of UAVs. A variety of effective solutions have been proposed, grouped according to two main types of technique, namely, simultaneous wireless information and power transfer (SWIPT) and techniques for determining the optimal positions for the UAV.
Radio frequency EH shows promise as a potential solution for UAV-assisted WSNs. Initial studies on radio frequency EH were used in a technology termed wireless power transfer (WPT) to recharge the wireless sensors in the WSN. A new radiofrequency EH technique, termed SWIPT, introduced significant benefits to WPT [11]. Many authors have studied SWIPT over the last two decades [12][13][14][15], investigating the performance difference between time switching and power splitting in SWIPT protocols [16]. The current study applied a time-switching protocol because it supports a phase for EH.
Applied to all beyond 5G/6G wireless communications, nonorthogonal multiple access (NOMA) provides massive connections, low latency and high reliability [14,[17][18][19][20][21]. In the current study, we took advantage of NOMA's benefits, applying NOMA at the UAV to superimpose coding of the data received from wireless sensors and to forward this superimposed signal to a mobile data centre.

Motivation
The use of machine learning in practical applications is escalating. The authors in [6] applied an artificial neural network (ANN) for sensor clustering. By contrast, some wireless sensors in the study in [6] were clustered as separate single-member clusters. In [22], the authors proposed a distance-and energy-constrained K-means clustering scheme (DEKCS) for cluster head selection to prolong the lifetime of underwater WSNs. With this new clustering algorithm, a prospective cluster head was selected according to its position in the cluster and its residual battery level. The authors dynamically updated residual energy thresholds set for prospective cluster heads to ensure that the network fully depleted its energy before disconnection. In this manner, cluster heads could be drained of energy and become inactive/dead sensors. The current study applied the K-means algorithm and the gap statistic method first introduced in [23] to obtain an optimal number of subclusters. To our best knowledge, the gap statistic method has not been applied for WSN clustering in any previous study.
In [24], the authors examined a UAV-assisted data collection WSN. The UAV's trajectory was optimized by applying the travelling salesman problem. Note that in [1], the UAV in the proposed network visited every wireless sensor, while in [24], the optimal serving order for sensors was determined according to a standard travelling salesman problem algorithm, which can be optimally solved with the efficient cutting-plane method (i.e., the shortest path from the start point to the end point). The authors also proposed an algorithm which used the pattern search method to solve the problem of optimizing the UAV position and sensor uploading power. In [1], the UAV could be exhausted as a consequence of long flight distances. The authors in [24] used a UAV that navigated the shortest path from the start point to end point, but it consequently ignored/missed some wireless sensors. In another study, the authors addressed the UAV's trajectory problem by jointly optimizing the UAV's velocity, hovering positions and visiting sequence [25]. The scientific community is very interested in studying UAVs' trajectories for the significant potential gains in aerial network performance. Researchers have applied several types of trajectory, for example, straight trajectory [26,27], circular trajectory [10,28,29] and spiral trajectory [30][31][32]. The authors in [25] introduced an interesting UAV trajectory scheme (Figure 4), where the UAV visited all N monitoring areas and then found suitable positions to transmit the collected data. In [25], the UAV collected data in four stages: (i) UAV data collection flight, (ii) UAV data collection processing, (iii) UAV data transmission flight and (iv) UAV data transmission processing. That UAV's operating schedule is illustrated in Figure 5 [25].
The current study proposes the use of a UAV for its high mobility, quick implementation and low cost. The main drawback to a small-sized, lightweight aircraft such as a UAV, however, is its limited on-board energy. UAVs are therefore not suitable for flying close to each sensor to collect data, as proposed in [25]. The current study therefore investigated the application of a K-means algorithm to cluster wireless sensors into multiple, optimized subclusters.

Contribution
Inspired by the studies mentioned in the previous section, we employed a UAV as an aerial relay to provide a sustainable, functional solution for a WSN. The main contributions of the current study are: • The use of three-dimensional Cartesian coordinates for a WSN which contains a random number of randomly distributed wireless sensors. • The decomposition of the UAV trajectory optimization into two subproblems: (i) the global WSN cluster is divided into multiple subclusters whose number is optimized with unsupervised machine learning which applies K-means clustering in combination with the gap statistic method; (ii) a centroid-to-next-nearest-centroid algorithm is then applied to find the shortest path for travel through every subcluster. • An analysis of the system performance of the WSN over Rayleigh distributions and a presentation of the derived closed-form expressions for the outage probability at the UAV and mobile base station. • Outage probability results for the UAV and mobile base station derived from Monte Carlo simulations and verified with an analysis.
The remainder of the paper is organized as follows: Section 2 introduces the WSN model, wireless sensor clustering algorithm, joint UAV trajectory, free-space channel modelling, and joint UAV operating schedule; Section 3 provides an analysis of the WSN's performance based on outage probability and presents the closed form expressions for outage probability at the UAV and mobile base station; Section 4 examines and plots the investigated results; Section 5 discusses conclusions.
For clarity, Table 1 presents the notation used in the paper.
Number of antennae at the sensors, UAV and mobile base station Visited cluster set Updated after the UAV visits the centroid of a subcluster C k as given byC ←C ∪ C k H S n ,U , H U,B Precoding fading channel matrices from sensors to the UAV and from the UAV to the mobile base station Power allocation factor for sensor S n , indexed ith in Respective power domains at the sensors, UAV and mobile base station B Let P S 1 = . . . = P S N dB R Predefined bit-rate threshold for sensors bps/Hz n of sensor S n is decoded SIC decodes the message with the biggest power allocation factor by treating other messages and AWGN as interference Instantaneous bit rate reached at UAV U and mobile base station B when message x S (i,k) n of sensor S n is decoded bps/Hz Outage probabilities at UAV U and mobile base station B 0 ≤ OP U ≤ 1, 0 ≤ OP B ≤ 1, a lower outage probability result is better performance

WSN Model
The current study examines a general WSN with a randomly distributed number of wireless sensors. Figure 1 depicts a WSN with a random number of sensors N = 42 positioned at the Cartesian coordinate (x, y, z) in three dimensions. Let us assume that a mobile base station B is positioned at B(0, 0, 0) and each wireless sensor S n for n = {1, . . . , N} is positioned randomly at coordinate S n (x, y, 0), where x = {0.1, . . . , 1} and y = {0.1, . . . , 1} as shown in Table 2. For simplicity, we assume that the wireless sensors and mobile base station are positioned relative to a flat earth. Definition 1. We denote the global set C as containing all wireless sensors. |C| returns N, the total number of wireless sensor nodes (i.e., |C| = N). Let us assume that the data observations (i.e., wireless sensor positioning) are clustered into K subclusters, i.e., C ⊇ C 1 ∪ . . . ∪ C K and Figure 1 illustrates a random distribution of wireless sensors. Each wireless sensor is allocated a given index by the subscript n, where n = {1, . . . , N} and a lower index n has a higher priority. For clarity, sensor S n has a higher priority than sensor S n+1 (e.g., sensor S 1 has a higher priority than sensor S 2 ).   Note: Two or more wireless sensors will never occupy the same position; each position is allocated only one wireless sensor, as illustrated in Figure 1.

WSN Clustering
Remark 1. Because the optimization problem is complex, we propose breaking it down into several subproblems and observing the random distribution of wireless sensors over a large geographical area, as illustrated in Figure 1. The global wireless sensor cluster can be divided into multiple subclusters. The number of subclusters can be optimized by applying the gap statistic method, and the wireless sensors can be assigned to a subcluster using the K-means algorithm. To solve these problems, we propose a solution in Proposition 1.

Proposition 1.
The optimal number of subclusters is yielded as follows: • Observing the latitudes (x-axis) and longitudes (y-axis) of the wireless sensors, we determine the optimal number of subclusters K ← k optimal . The gap statistic method is applied to the number of subclusters k to compute the corresponding total within the intracluster variation W k , i.e., the sum of squares function, given by where κ = {1, . . . , k}, |C κ | returns the number of wireless sensor nodes in cluster C κ and d S i ,S j is the squared Euclidean distance of all pairwise sensor nodes in the cluster C κ for S i , S j ∈ C κ and i = j. It is important to note that we assume k min = 4 and k max = |C| , where |C| is the number of observations (or the number of sensors within the global cluster C). Let us briefly consider factors k min and k max . We define k min = 4 to prevent a uniform distribution of wireless sensors positions throughout the area; the gap statistic method thus returns the optimal number of clusters k optimal = 1. For example, Figure 2a,b in [23] plots the distribution of sensors spread throughout a region and the corresponding optimal number of clusters at K = 1, respectively. However, we also define k max = N k min to prevent each wireless sensor owning a private cluster, a problem that would lead to a UAV visiting every wireless sensor to collect data.
• Reference data sets Ω with a random uniform distribution are generated. Each reference data set ω of these reference data sets Ω is clustered with a variable number of clusters k = {k min , . . . , k max }. The corresponding total is computed within the intracluster variation W κω given in the dispersion metrics for κ = {1, . . . , k} and ω = {1, . . . , Ω}.

•
The estimated gap statistic is computed as the deviation of the observed W k value from its expected value W κω under the null hypothesis Gap(k) = 1 log(W κω ). The standard deviation (sd) of the statistics is then computed, given by • Using the gap statistic method, the smallest value of κ is selected as the optimal number of clusters, the gap statistic being within one standard deviation of the gap statistic at κ + 1, For example, Figure 2 indicates the optimal number of clusters at k optimal = 4, determined by the gap statistic algorithm according to the randomly positioned sensor nodes shown in Figure 1.
The K-means algorithm was used to calculate the position for each centroid, with an optimal number of clusters K ← k optimal . The computed centroids of four subclusters (K = 4) are listed in Table 3. Figure 3 illustrates all wireless sensor nodes after clustering to K subclusters. After clustering, each sensor node is grouped into a subcluster; for example, S 3 1 indicates that sensor S 1 is a member of subcluster C 3 ( Figure 3).  Cluster C 1 Cluster C 2 Cluster C 3 Cluster C 4 Figure 3. Sensor clustering using the K-means algorithm, with optimal number of clusters K = 4.

Joint UAV Trajectory
Definition 2. We introduce a novel joint UAV trajectory algorithm to compute the centroid to next nearest centroid.

Problem 1.
Operating as a flying relay, the UAV has the advantage of a high mobility and is able to fly close to wireless sensors to receive and forward a superimposed signal. This, however, leads to a long flight path, and the other wireless sensors must wait to be served. We minimized the flight time/path of the UAV according to the cluster centroid positions shown in Table 3. How to obtain the shortest flight path is outlined in Proposition 2.

Proposition 2.
The centroid-to-next-nearest-centroid trajectory was computed as follows: • Step 1: To determine the nearest centroid from the mobile base station B, we calculate the smallest pairwise Cartesian distance from the mobile base station to each subcluster centroid.

•
Step 2: The UAV selects the next nearest cluster centroid. In this case, the UAV considers candidate centroids without regard to any of the previously selected cluster centroids inC. It is important that the centroids contained in the visited setC be removed from the candidate list to prevent the UAV returning to the previous subclusterC. The UAV repeats Step 2 (i.e., C\C = ∅) until the list of candidate subclusters is empty (i.e., C\C = ∅).
Without loss of generality, we examined a single round trip of the UAV. Table 4 lists the next nearest subcluster centroids determined from the above selection strategy. The results in Figure 3 indicate that subcluster C 1 was the nearest to the mobile base station B compared to the other subclusters. Subcluster C 1 was therefore selected at block period time T = 1. The UAV visited subcluster C 1 first to collect data from all sensor members in subcluster C 1 . The visited setC ←C ∪ C 1 was then updated. After all data from the sensor members in subcluster C 1 were collected, the UAV selected subcluster C 3 because it contained the next nearest subcluster centroid. The UAV then visited subcluster C 3 at global time period T = 2 to collect data from all sensor members in the subcluster. The visited setC ←C ∪ C 3 was again updated. The UAV continued to follow this procedure, selecting the next nearest centroid and updating the visited set, until all data have been collected from each subcluster. In this manner, the UAV followed the shortest possible flight path, as shown in Figure 4. After travelling through all K subclusters (C ≡ C) and collecting all data from wireless sensor members in each subcluster, the UAV's task was complete and it returned to the mobile base station. For the real-time application of a UAV-assisted WSN, the UAV would repeat the round trips summarized in Table 5. Table 4. Pairwise centroid-to-centroid distance based on Cartesian distances.
Observing Table 4, notice that numbers with inclined lines (e.g., 0) and numbers with bold (e.g., 0.5005) mean visited clusters and next nearest clusters. For clarity, when UAV visited cluster C 1 (row with C 1 ), the UAV selects the next-nearest cluster (i.e., C 3 ) and ignores cluster C 1 . Next, the UAV visited cluster C 3 (row with C 3 ), the UAV selects the next-nearest cluster (i.e., C 4 ) and ignores clusters C 1 and C 3 . The remaining rows in Table 4 have the same meaning. . Joint UAV trajectory and the shortest path based on the centroid-to-next-nearest-centroid distance given by Algorithm 1 (i.e., C 1 → C 3 → C 4 → C 2 ). Algorithm 1 K-means clustering for the optimal number of subclusters and shortest path determined from a centroid to the next nearest centroid Input: Generate a wireless sensor network with a number N of randomly positioned wireless sensors; Output: An optimal number of subclusters K and subcluster centroids; 1: Initialize variables k min = 4, k max = N k min ; 2: Attempt ∀k = {k min , . . . , k max } to find the optimal number K of subclusters, computed according to Proposition 1; 3: Find the centroid positions for K subclusters by applying K-means clustering; 4: Compute the pairwise distances between the mobile base station B and subcluster centroids; 5: Select the nearest centroid and updateC; 6: while C ∩C = |C| do 7: Compute the pairwise distances between the current centroid and other centroids; 8: Select the nearest centroid and then updateC. 9: end while 10: return the number of subclusters K, centroid positions C k (x, y) for k ∈ K and the shortest path.

Channel Modelling for a UAV-Assisted WSN
In our previous work [33], we considered free space (i.e., air-to-ground (A2G), groundto-air (G2A) and air-to-air (A2A)) and first introduced the flat-earth distance based on real latitudes and longitudes. The proposed solutions were effective in determining and tracking the optimal positions for the UAV. A separate study [33] examined the problems related to channel modelling in a WSN which contained multiple subclusters. In the current study, we address the uplinks, i.e., the channels from the wireless sensors to the UAV (H S n ,U ) and the channels from the UAV to the mobile base station (H U,B ). The precoding channel matrices H S n ,U and H U,B are expressed by where A S n and A U are the number of antennae on the wireless sensor S n and UAV U, respectively; the channel coefficient h where g is the Rayleigh fading channel, ε is the path-loss exponent, and d G2A S n ,U is the G2A distance from the sensor node S n to UAV U. Note that the free-space distance based on latitude and longitude is given by expression ( [33], Equation (2)). For simplicity and without loss of generality, all wireless sensors are allocated with Cartesian coordinates in a three-dimensional space. The G2A distance from wireless sensor S n to UAV U is therefore given by d where the x, y and z axes represent latitude, longitude and altitude, respectively, on a flat earth.
Similarly, the precoding channel matrix H U,B is expressed as where A B is the number of antennae at the mobile base station, and the channel is the A2G distance from the UAV U to the mobile base station B and given by

UAV Joint Schedule
This study introduces a novel scheduling protocol for a UAV-assisted WSN. The coefficient t is the time required to complete a transmission cycle of three phases, i.e., λ 1 , λ 2 and λ 3 , where λ 1 is the first phase during which the UAV receives signals from the sensor nodes in a cluster, λ 2 is the second phase during which the UAV receives radiofrequency energy from the mobile base station, and λ 3 is the third phase during which the UAV transmits the superimposed signals to the mobile base station for data analysis. Figure 5 depicts an electronic control unit (ECU) which performs a task corresponding to a predefined operation in a common UAV schedule. The ECU implements an electronic switch which applies three successive modes during the same transmission block t:

•
In phase λ 1 , the interface of the receiving signal circuit is active while the other interfaces are inactive. The UAV receives signals from the sensor nodes in the currently visited subcluster, given by (5). • In phase λ 2 , the interface of the EH circuit is active while the other interfaces are inactive. The UAV receives radiofrequency energy from the mobile base station B, given by (10), while the ECU decodes the messages from the signals received from the wireless sensors. • In phase λ 3 , the interface of the transmitting signal circuit is active while the other interfaces are inactive. The UAV encodes the messages received in the first phase and forwards the superimposed signals to the mobile base station B, given by (11).

Phase 1: Uplinks between Wireless Sensors and the UAV
In the first phase, having selected the next nearest subcluster centroid, the UAV visits and hovers at the selected centroid and receives signals wirelessly from the subcluster's sensors. Figure 6 illustrates the procedure of receiving signals and processing data at the UAV during the first phase. Based on the number of subclusters K and the global period T, we calculate the UAV period t as follows: where the mod{T, K} function refers to the modulo value between T and K (e.g., for T = 10 and K = 4, t = mod{T, K} = 2). At each UAV period t, the UAV selects the next nearest centroid using the centroid-to-next-nearest-centroid algorithm. According to the trajectory mapped in Table 5, for T = 10, K = 4 and t = 2, the UAV selects the subcluster C 3 and serves wireless sensors S . The signals received from the wireless sensors in subcluster C k over UAV period t are given by where t is obtained from (4), k is mapped as in Table 5, and P S n is the transmit power of sensor S n . For simplicity, we assume that P S 1 = . . . = P S N . Let us denote D(T, N, K), which is the mathematical description of a diagonal matrix, as follows: where the diagonal matrix D has the size |C k | × |C k | for transmission period T and all nondiagonal elements are zero, as indicated in Figure 6. The predecoded matrix obtained at the UAV is derived by multiplying the received signal matrix (5) with the diagonal matrix (7); thus, preDecode(T, N, K) = y S (i,k) n (T, N, K) × D(T, N, K). The UAV selects each element in the predecoded matrix to obtain the SINR at the point when the UAV decodes message x S n from sensor S n ∈ C k , as follows: where the signal-to-noise ratio (SNR) ρ S n = P S n N 0 . For simplicity, we assume that We then obtain the instantaneous bit rate at the point when the UAV decodes message n , as follows: The most challenging aspect of deploying a UAV is managing its power limitations as a small, lightweight aircraft. We propose applying SWIPT techniques to prolong the UAV's online time. In a previous study [33] (Figure 4), we adopted a power splitting protocol. In the current study, we applied a time-switching technique for its advantages in a WSN ( Figure 5); the technique is different from the proposed time-switching models in [33] ( Figure 4). In phase λ 2 , the UAV harvests radiofrequency energy from the mobile base station according to where P B is the power domain at the mobile base station, and σ B,U is the expected channel gain between the mobile base station and the UAV at its current UAV position. It is important to note that η is the collected energy factor and that we assume η = 1 for simplicity.

Phase 3: Transmitting Signals
In [10], the authors applied the amplify-and-forward protocol at the UAV to receive and forward signals to a single device. In the current study, we implemented a decodeand-forward protocol at the UAV to ensure that the UAV received, decoded and encoded messages successfully before forwarding the superimposed signals to the mobile base station. To improve latency, we applied the emerging NOMA technique for its high spectral efficiency. The UAV U encoded the messages ∀x S n (i,k) ∈ X k from the sensors in the current subcluster C k and superimposed them into the signal by sharing the power domain P U and using different power allocation factor α (i,k) S n . From the precoding matrix H U,B , as given by (3), only the best channel was selected for signal transmission.
In the third phase λ 3 of transmission block t, the mobile base station B received radiofrequency signals as follows: where n B is the AWGN (i.e., n B ∼ CN(0, N 0 ) with zero mean and variance N 0 ) at the mobile base station B, P U is the power domain at UAV U, and α n . The NOMA technique applies superimposed coding by sharing the power domain and therefore, the power allocation strategy strongly affects the success or failure of decoding a message. Previous studies [17,21,34] have also applied power allocation strategies; the current study, however, addresses a WSN divided into multiple subclusters and therefore proposes the novel power allocation strategy described below. Proposition 3. The power allocation strategy for transmitting messages from wireless sensors over UAV transmission period t while the UAV visits subcluster C k is expressed as follows: where a sensor with a higher priority is allocated a larger power allocation factor; for example, sensor S 1 , which has the highest priority and is the first member in subcluster C 3 , is allocated the largest power allocation factor α S (1,3) 1 = 0.1428, whereas sensor S 36 , which has the lowest priority and is the last member in the subcluster C 3 , is allocated the smallest power allocation factor α S (13,3) 36 = 0.011.
For clarity, we applied the power allocation factors presented in Table 6. From Equation (12), the power allocation strategy in the subcluster is constrained such that α  The SINR at the mobile base station B when B decodes message x , and AWGN n B as interference by applying SIC: where i < N k in (13) and i = N k in (14). The maximum instantaneous bit-rate threshold attained if the mobile base station decodes message x S (i,k) n in the best-received signal, given by (11), is expressed as: where ∀x S (i,k) n ∈ X k , and ∀i = {1, . . . , N k }.

System Performance Analysis
In this section, we derive the novel closed-form expressions for the independent outage probability at the UAV and the dependent outage probability at the mobile base station.

3.1.
Outage Probability Performance at the UAV Theorem 1. The independent outage probability at the UAV U relates to the UAV's unsuccessful decoding of the message in the received signal, given by (5). In other words, the maximum instantaneous bit-rate threshold, given by (9), cannot reach the predefined bit-rate threshold R. The independent outage probability at the UAV U in transmission block t is therefore expressed as Based on Equation (16), we propose Algorithm 2 to calculate the Monte Carlo simulations for the outage probability at the UAV U.

Algorithm 2 Calculate the outage probability at the UAV U from (16) for transmission block t.
Input: Initialize the parameters as in Table 1 and randomly generate 10 6 samples of each fading channel over a Rayleigh distribution Output: Simulate (Sim) the results for outage probability at the UAV U in transmission block t 1: for k = 1 to the optimal number of subclusters K do 2: for i = 1 to the number N k of sensor members within the subcluster C k do 3: Calculate the SINR at the UAV from (8); 4: Calculate the achievable maximum instantaneous bit-rate from (9); 5: Initialize variable count ← 0; 6: for l = 1 to 10 6 samples do 7: if min 14: end for 15: return Outage probabilities at UAV OP U (T, N, K); (16), we obtain the outage probability at the UAV over Rayleigh distributions:

Remark 2. From expression
where the SINR threshold is given by γ = 2 2R − 1. It is important to note that Equation (17) obtains the independent outage probabilities at the UAV. Generally, the outage probability at the UAV is calculated from OP U (T, N, See Appendix C for the proof.

Outage Probability at the Mobile Base Station
Theorem 2. The dependent outage event at the mobile base station occurs when the flying relay (FR)-UAV either cannot decode at least the message x S (i,k) n ∈ X k or the mobile base station B cannot decode at least the message x S (i,k) n ∈ X k from the best-received signal y B (T, N, K), given by (11). The outage probability at the mobile base station with an underlying U-assisted multi-input multioutput (MIMO)-NOMA network is therefore expressed as Based on Equation (18), we propose Algorithm 3 to calculate the Monte Carlo simulations for outage probability at the mobile base station for transmission block t over Rayleigh distributions.

Algorithm 3 Calculate the outage probability at the mobile base station from (18) for transmission block t over Rayleigh distributions.
Input: Initialize the parameters as in Table 1 and randomly generate 10 6 samples of each fading channel over a Rayleigh distribution; Output: Simulate (Sim) the results for outage probability at the mobile base station B; 1: for k = 1 to the optimal number K of the subcluster do 2: for i = 1 to the number of sensors N k do 3: Calculate the SINR at the UAV U from (8); 4: Calculate the achievable maximum instantaneous bit-rate at the UAV U from (9); 5: Calculate the minimum-maximum instantaneous bit-rate threshold at the UAV U from (9); 6: Calculate the SINR at the mobile base station from (13) or (14); 7: Calculate the achievable maximum bit-rate at the mobile base station from (15); 8: Calculate the achievable minimum-maximum bit-rate at the mobile base station from (15); 9: Initialize variable count ← 0; 10: for l = 1 to 10 6 samples do 11: if 18: end for 19: return Dependent outage probability at the mobile base station OP B (T, N, K); Remark 3. The outage probability at the mobile base station in transmission block t is given by (18) from Theorem 1 and expressed in novel closed-form as follows: where SINR threshold γ = 2 2R − 1.
See Appendix D for the proof.

Numerical Results and Discussion
In this section, we examine the individual WSN and discuss the results of the study. For the purposes of the analysis and the Monte Carlo simulations, a random number of wireless sensors N was generated and randomly distributed according to the positions il-lustrated in Figure 1. Unless specified otherwise, we assumed that the mobile base station's position was at coordinate B(0, 0, 0) and that the UAV's position U(x, y, 1) determined by K-means clustering had a fixed altitude at z = 1. The number of antennae equipped at the wireless sensors, UAV and mobile base station was A S n = A U = A B = 2. The K-means algorithm determined the optimal number of subclusters as K = 4. The path-loss exponent factor was ε = 4. The list of pairwise distances from each subcluster centroid to the mobile base station was d A2G C 1 ,B = 0.494, d A2G C 2 ,B = 0.8788, d A2G C 3 ,B = 0.9268 and d A2G C 4 ,B = 1.1620. The nearest subcluster to the mobile base station was therefore C 1 . The UAV selected subcluster C 1 for the global period T = {1, 5, 9, 13, . . .}. At global period T = {2, 6, 10, 14, . . .}, the UAV then selected the next nearest subcluster to its current subcluster C 1 , i.e., subcluster C 3 , because distances d A2A 7, 11, 15, . . .}, the UAV again selected the next nearest subcluster to subcluster C 3 , i.e., subcluster C 4 , since distances d A2A 8, 12, 17, . . .}, the UAV selected the next nearest subcluster to subcluster C 4 , i.e., subcluster C 2 , because the final distances d A2A C 4 ,C 2 = 0.5262. The UAV thus selected the shortest trajectory C 1 → C 3 → C 4 → C 2 . Without loss of generality and for simplicity, we assumed that λ 1 = λ 2 = λ 3 = t 3 for a single round trip of the UAV and global period T = {1, 2, 3, 4}.

Numerical Results
Figure 7a-d plot the outage probabilities at the UAV at the point when it decoded the received signals from wireless sensors in subclusters C 1 , C 3 , C 4 and C 2 , respectively. The bit-rate threshold for all wireless sensors was R = 1.5 bps/Hz. The outage probabilities at the majority of wireless sensors were very similar as SNR ρ S n → ∞; however, the graphs in Figure 7a indicate that the outage probability of sensor S 23 was worse than the outage probability at the other sensors of the same subcluster C 1 . Figure 3 indicates that wireless sensor S 23 was the farthest from the subcluster centroid C 1 (d G2A S (7,1) 23 = 1.0479). The results verified the efficiency of the K-means algorithm. It is important that the bit-rate threshold R for the wireless sensors was set to R = 1.5 bps/Hz. However, the UAV successfully decoded most of the messages from the wireless sensors, achieving a high outage probability performance (Figure 7). We conclude that the outage probability performance of the majority of wireless sensors in each subcluster was equal since they were evenly distributed around the subcluster's centroid. The Monte Carlo simulations given by (16) were also verified by the analysis results given by (17).
Next, we examined the results for the mobile base station and obtained its outage probability performance at the points when the UAV visited subclusters C 1 , C 3 , C 4 and C 2 and forwarded the superimposed signals, given by (11), to the base station (Figure 8a-d).
The outage probability performance of the mobile base station was poorer than the outage probability performance at the UAV (Figure 7a-d), even though the bit-rate threshold was set to R = 0.1 bps/Hz. This may have been because the UAV was deployed with NOMA and therefore, the sensors were forced to share the power domain to transmit the messages in the superimposed signal. This means that the last member in the subcluster was allocated a very small power allocation factor, given by (12). These power allocation factors are presented in Table 6. Subcluster C 3 contained N 3 = 13 wireless sensors, and the last wireless sensor in C 3 ( S Therefore, despite a global optimization of the subclusters and the positions of the cluster centroids by the K-means algorithm, the large number of wireless sensors in the subcluster unfortunately led to unsatisfactory results.

Discussion
The outage probability performance at the mobile base station was strongly affected by several factors, such as the UAV's transmit power P U , the distance of the UAV from the mobile base station d U,B and the number N k of messages transmitted in the superimposed signals. The UAV was not able to increase the transmit power P U , however, because of its power limitations. A large number of antennae at both the wireless sensors and the UAV could not be equipped since the constraints for a small size, light weight and low cost did not permit it. It was also not possible to reduce the distance from the UAV to the mobile base station because of obstructions in the terrain. To address these conditions, we equipped a larger number of antennae at the mobile base station, as the mobile base station incorporated a generator and energy was not a significant problem. Therefore, equipping A B = 32 antennae at the mobile base station instead of the same number at the UAV A S n = A U = A B = 2 (Figure 8a-d) yielded the results in Figure 9a-d. It is clear that the outage probability improved significantly at the mobile base station for A B = 32 while A S n = A U = 2. It is also clear that the outage probability performance at the mobile base station when the UAV visited subcluster C 1 improved greatly since this subcluster C 1 was closer than the other subclusters. The outage probabilities at the other subclusters also improved as the SNR ρ U → ∞.

Conclusions
This study presented a general WSN containing a randomly distributed number of wireless sensors with three-dimensional Cartesian coordinates. To improve the WSN's performance, we applied a K-means algorithm and gap statistic method to optimize sensor clustering into a number of subclusters K. The UAV's trajectory was calculated with an algorithm which determined the shortest path between the subcluster centroids. The aims of the study were achieved (i.e., flexible deployment, low cost and high reliability) through the effective proposed solutions, and the results were verified with both Monte Carlo simulations and theoretical analysis. Although the study provided some benefits from the application of the K-means algorithm for wireless sensor clustering, some problems still persisted that can be studied in future work. Future studies can investigate the problems with (1) fragmented power resources created by an imbalance in the number of subcluster sensors and (2) some clusters covering a larger geographic area than others as a result of sparsely distributed sensors. As a potential solution, we propose dividing the network into larger clusters when the number of sensors reaches a certain threshold.

Data Availability Statement:
The data used in this study were randomly generated.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study, the collection, analysis and interpretation of data, writing of the manuscript, or the decision to publish the results.
For example, the results in Figure 4 indicate that the UAV's period to visit all subclusters was t = 4; the results in Figure A1d, however, indicate a UAV period of t = 10. We conclude that fewer sensors in each subcluster deliver better results, but at the expense of the UAV consuming more energy to travel longer distances. (c) (d) Figure A1. The randomly distributed WSN (a), determined optimal number of subclusters K (b), division into subclusters (c) and centroid-to-next-nearest-centroid trajectory (d).

Appendix C
By substituting the SINR given by (8) into (9) and then substituting (9) into Theorem 1, as given (16), we thus obtain the independent outage probability at the UAV: From the precoding matrix |H S n ,U | 2 given (2) and the PDF given by (A1), we obtain

Appendix D
Expression (18) is rewritten as follows: By substituting the SINR given by (8) into (9) and then substituting (9)  . By substituting the SINR given by (13) or (14) into (15) and then substituting (15) into ξ, we obtain From expression (A5), we conclude that the outage probability at the mobile base station is independent since it belongs to the outage probability at UAV. The study verified that ϑ < ξ and therefore, the outage probability at the mobile base station given by (A5) refers to OP B (T, N, K) = max{ϑ, ξ} = ξ. Observing the individual WSN model depicted in Figure 4, a possible explanation is that the UAV was close to the wireless sensors, which thus strongly owned the channel state information (CSI) h S n ,U , even though they simply transmitted their own messages x S n under their own power domain P S n . However, the UAV travelled a long distance since the subcluster was far from the mobile base station; therefore, it forwarded messages in the superimposed signal by sharing the power domain P U , and thus OP B (T, N, K) = max{ϑ, ξ} = ξ. To improve OP B (T, N, K), both ϑ and ξ must be improved. To improve ϑ, the number of antennae at the wireless sensors or their transmit power must be increased. However, wireless sensors have certain constraints, such as having a low cost and low power. These solutions are therefore not practical or even obtainable. To address these conditions, we attempted to improve ξ by increasing the number of antennae at the mobile base station (A B = 32). The outage probabilities at the mobile base station improved (Figure 9a-d) over the previous results (Figure 8a-d).